This repository was archived by the owner on Nov 1, 2024. It is now read-only.


wenleix
Contributor

wenleix commented Jan 17, 2022

Summary:
This is quite convenient in the Criteo-DLRM preprocessing workload (ongoing work in meta-pytorch/torchrec#2). For example, we can do

df["dense_features"] = (dense_features["dense_features"] + 3).log()

where "dense_features" is a nested struct with 13 columns.
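To make the semantics concrete, here is a minimal pure-Python sketch, assuming a nested struct column can be modeled as a dict of child columns. `map_numeric`, `add` and `log` are hypothetical helpers for illustration, not torcharrow's API or implementation; the point is that a numeric op on a struct simply recurses into each field.

```python
import math
from typing import Callable, Dict, List, Union

# A column is either a flat list of numbers or a nested struct,
# modeled here (hypothetically) as a dict mapping field name -> column.
Column = Union[List[float], Dict[str, "Column"]]


def map_numeric(col: Column, fn: Callable[[float], float]) -> Column:
    """Apply fn element-wise, recursing into every field of a nested struct."""
    if isinstance(col, dict):
        return {name: map_numeric(child, fn) for name, child in col.items()}
    return [fn(x) for x in col]


def add(col: Column, other: float) -> Column:
    return map_numeric(col, lambda x: x + other)


def log(col: Column) -> Column:
    return map_numeric(col, math.log)


# Hypothetical frame with a nested "dense_features" struct (2 fields instead of 13).
df = {
    "labels": [1, 0],
    "dense_features": {"int_1": [0, 10], "int_2": [1, 11]},
}

# Analogue of: df["dense_features"] = (df["dense_features"] + 3).log()
df["dense_features"] = log(add(df["dense_features"], 3))
print(df["dense_features"])
```

Only the struct column is transformed; `labels` is left untouched because the op is applied to `df["dense_features"]` alone.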

API rationale: Pandas supports only a limited number of numeric functions, but the numeric ops it does support (e.g. abs, add) apply to both Series and DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2, -3], "b": [-4, -5, 6]})

>>> df["a"].abs()
0    1
1    2
2    3
Name: a, dtype: int64
>>> df.abs()
   a  b
0  1  4
1  2  5
2  3  6
>>> df["a"].add(1)
0    2
1    3
2   -2
Name: a, dtype: int64
>>> df.add(1)
   a  b
0  2 -3
1  3 -4
2 -2  7
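The property this rationale relies on can be checked directly in pandas: the DataFrame form of `abs`/`add` gives the same result as calling the op on each column Series. A small sketch covering only the two ops shown above:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, -3], "b": [-4, -5, 6]})

# For each op, the DataFrame result must equal the op applied to every column Series.
for op in (lambda x: x.abs(), lambda x: x.add(1)):
    frame_result = op(df)
    for name in df.columns:
        assert frame_result[name].equals(op(df[name]))
print("DataFrame ops match per-column Series ops")
```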

Differential Revision: D33616632

facebook-github-bot added the CLA Signed and fb-exported labels Jan 17, 2022
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D33616632

Summary:
Previously this silently returned None. To reproduce:

```
import torcharrow as ta
import torcharrow.dtypes as dt

dtype = dt.Struct(
    [
        dt.Field("labels", dt.int8),
        dt.Field("dense_features", dt.Struct([dt.Field("int_1", dt.int32), dt.Field("int_2", dt.int32)])),
    ]
)

df = ta.DataFrame(
    [
        (1, [0, 1]),     # <-- Should be (1, (0, 1))
        (0, [10, 11])
    ],
    dtype=dtype)
```

Previously it silently returned None and later caused a hard-to-understand error ("'NoneType' object has no attribute 'dtype'"), which is especially confusing when there are many nested fields (e.g. 13/26 nested fields in the Criteo dataset). It now raises an error with more information to help the developer understand the problem.
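The idea behind the fix is to validate each row against the (possibly nested) struct dtype and raise an error naming the offending field path, rather than returning None. A minimal sketch, assuming a toy dtype model (a struct as a list of `(name, type)` pairs); `validate` is a hypothetical helper, not torcharrow's actual code or error message:

```python
from typing import Any, List, Tuple, Union

# Hypothetical dtype model: a struct is a list of (name, dtype) pairs;
# a leaf dtype is a Python type such as int.
DType = Union[type, List[Tuple[str, "DType"]]]


def validate(value: Any, dtype: DType, path: str = "row") -> None:
    """Raise an error naming the offending field instead of silently returning None."""
    if isinstance(dtype, list):  # struct dtype
        if not isinstance(value, tuple):
            field_names = [name for name, _ in dtype]
            raise TypeError(
                f"{path}: expected a tuple for struct with fields {field_names}, "
                f"got {type(value).__name__} {value!r}"
            )
        if len(value) != len(dtype):
            raise ValueError(f"{path}: expected {len(dtype)} fields, got {len(value)}")
        for item, (name, field_dtype) in zip(value, dtype):
            validate(item, field_dtype, f"{path}.{name}")
    elif not isinstance(value, dtype):
        raise TypeError(
            f"{path}: expected {dtype.__name__}, got {type(value).__name__} {value!r}"
        )


dtype = [("labels", int), ("dense_features", [("int_1", int), ("int_2", int)])]

validate((1, (0, 1)), dtype)  # well-formed row: passes silently
try:
    validate((1, [0, 1]), dtype)  # list where a struct tuple is expected
except TypeError as e:
    print(e)
```

Carrying the field path (`row.dense_features`) through the recursion is what makes the error readable when a struct has dozens of nested fields.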

Differential Revision: D33595208

fbshipit-source-id: 21ac06c6ed96a977b0fdd81f4586b631d6e3f0f1
Summary:
Pull Request resolved: pytorch#141

This is quite convenient in the Criteo-DLRM preprocessing workload (ongoing work in meta-pytorch/torchrec#2). For example, we can do
```
df["dense_features"] = (dense_features["dense_features"] + 3).log()
```
where "dense_features" is a nested struct with 13 columns.

API rationale: Pandas supports only a limited number of numeric functions, but the numeric ops it does support (e.g. `abs`, `add`) apply to both Series and DataFrame:
```
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2, -3], "b": [-4, -5, 6]})
>>> df["a"].abs()
0    1
1    2
2    3
Name: a, dtype: int64
>>> df.abs()
   a  b
0  1  4
1  2  5
2  3  6
>>> df["a"].add(1)
0    2
1    3
2   -2
Name: a, dtype: int64
>>> df.add(1)
   a  b
0  2 -3
1  3 -4
2 -2  7
```

Differential Revision: D33616632

fbshipit-source-id: acd424666644b62f7a50ac631d291de5b8786d98
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D33616632

YLGH pushed a commit to YLGH/torcharrow that referenced this pull request May 7, 2022
…edding (pytorch#141)

Summary:
Pull Request resolved: meta-pytorch/torchrec#141

This function can be used for post-training quantization.

Reviewed By: divchenko

Differential Revision: D34471765

fbshipit-source-id: 4f371f2b82daa2aeaef0b420231e3dcec60c1090
facebook-github-bot pushed a commit that referenced this pull request Jun 29, 2022
Summary:
X-link: #141

This is quite convenient in the Criteo-DLRM preprocessing workload (ongoing work in meta-pytorch/torchrec#2). For example, we can do
```
df["dense_features"] = (dense_features["dense_features"] + 3).log()
```
where "dense_features" is a nested struct with 13 columns.

API rationale: Pandas supports only a limited number of numeric functions, but the numeric ops it does support (e.g. `abs`, `add`) apply to both Series and DataFrame:
```
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2, -3], "b": [-4, -5, 6]})
>>> df["a"].abs()
0    1
1    2
2    3
Name: a, dtype: int64
>>> df.abs()
   a  b
0  1  4
1  2  5
2  3  6
>>> df["a"].add(1)
0    2
1    3
2   -2
Name: a, dtype: int64
>>> df.add(1)
   a  b
0  2 -3
1  3 -4
2 -2  7
```

Reviewed By: vancexu

Differential Revision: D33616632

fbshipit-source-id: e4a08e66ef787b002a418d0c443f1274cbc70569
Labels
CLA Signed, fb-exported